Awk is a powerful language for manipulating and processing text files. It is especially helpful when the lines in a text file are in a record format, i.e. each record contains multiple fields separated by a delimiter. Even when the input file is not in a record format, you can still use awk for basic file and data processing. You can also write programming logic in awk when there is no input file to process.
In short, AWK is a powerful language that comes in handy for daily routine jobs.
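As a minimal sketch of both uses described above (the sample records, their name:department:salary layout, and the function names are made up for illustration, not taken from any real data set):

```shell
#!/bin/sh
# Hypothetical sample data: name:department:salary, one record per line.
sample_records() {
  printf '%s\n' 'alice:eng:5000' 'bob:ops:4000' 'carol:eng:6000'
}

# Record-oriented use: split each line on ":" and total the third field
# for the records whose second field is "eng".
eng_total() {
  awk -F: '$2 == "eng" { total += $3 } END { print total }'
}

# No input file at all: a BEGIN block runs before any input is read,
# so awk can be used as a small standalone programming language.
squares() {
  awk 'BEGIN { for (i = 1; i <= 3; i++) print i, i * i }'
}

sample_records | eng_total   # prints 11000
squares                      # prints "1 1", "2 4", "3 9" on separate lines
```

Both functions use only features common to awk, nawk, and gawk, so they should behave the same whichever version is installed.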
If you are new to awk, start by reading this Awk introduction tutorial that is part of the Awk tutorial series.
The learning curve for AWK is gentler than for most other programming languages. If you already know C, you’ll appreciate how simple and easy it is to learn AWK.
AWK was originally written by three developers: A. Aho, B. W. Kernighan and P. Weinberger. The name AWK comes from the initials of their surnames.
The following are the three variations of AWK:
1. Awk
AWK is the original AWK written by A. Aho, B. W. Kernighan and P. Weinberger.
2. Nawk
NAWK stands for “New AWK”. This is AT&T’s version of Awk.
3. Gawk
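GAWK stands for “GNU AWK”. Many Linux distributions ship GAWK as their default awk, although some install a different implementation such as mawk. GAWK is fully compatible with AWK and NAWK.
On Linux systems where GAWK is the default, typing either awk or gawk invokes GAWK, because awk is a symbolic link to gawk, as shown below.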
    # ls -l /bin/awk /usr/bin/awk
    lrwxrwxrwx 1 root root 4 Jan 5 23:13 /bin/awk -> gawk
    lrwxrwxrwx 1 root root 14 Jan 5 23:13 /usr/bin/awk -> ../../bin/gawk
The following tables summarize the features available in each of these versions. As they show, gawk is a superset that contains all the features of the original awk and nawk.
Awk Vs Nawk Vs Gawk
The following basic built-in variables FS, OFS, RS, ORS, NR, NF, and FILENAME are available in all versions of awk.
Feature | Description | AWK | NAWK | GAWK |
---|---|---|---|---|
FS | Input field separator | Yes | Yes | Yes |
OFS | Output field separator | Yes | Yes | Yes |
RS | Record separator | Yes | Yes | Yes |
ORS | Output record separator | Yes | Yes | Yes |
NR | Number of the current record | Yes | Yes | Yes |
NF | Number of fields in the current record | Yes | Yes | Yes |
FILENAME | Name of the input file currently being processed | Yes | Yes | Yes |
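To make the table concrete, here is a small sketch that uses FS, OFS, NR, and NF together. The helper name and the comma-separated sample input are assumptions chosen for illustration; only variables available in every awk version are used.

```shell
#!/bin/sh
# Hypothetical helper: for each comma-separated record, print the record
# number (NR), the field count (NF), and the record itself, with the
# fields rejoined using the output field separator (OFS).
describe_records() {
  awk 'BEGIN { FS = ","; OFS = " | " }
       {
         $1 = $1            # assigning to a field makes awk rebuild $0 with OFS
         print NR, NF, $0
       }'
}

printf '%s\n' 'a,b,c' 'd,e' | describe_records
# prints:
# 1 | 3 | a | b | c
# 2 | 2 | d | e
```

Setting FS and OFS inside BEGIN ensures they take effect before the first record is split.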
None of the following features are available in the original awk. They are available in nawk, gawk, or both, as shown below.
Feature | Description | NAWK | GAWK |
---|---|---|---|
FNR | Record number within the current input file | Yes | Yes |
ARGC | Total number of arguments passed to the awk script | Yes | Yes |
ARGV | Array containing all awk script arguments | Yes | Yes |
ARGIND | Index into ARGV of the file currently being processed | | Yes |
SUBSEP | Subscript separator for array indexes | Yes | Yes |
RSTART | Set by the match function to the starting position of the matched text | Yes | Yes |
RLENGTH | Set by the match function to the length of the matched text | Yes | Yes |
OFMT | Format awk uses when printing numbers. Default is “%.6g” | Yes | Yes |
ENVIRON | Array containing all environment variables and values | | Yes |
IGNORECASE | Default is 0. When set to 1, string and reg-ex comparisons are case insensitive. | | Yes |
ERRNO | Contains the error message of a failed I/O operation, e.g. while using the getline function. | | Yes |
BINMODE n | Set binary mode for I/O. n can be 1 (input files), 2 (output files), or 3 (all files) | | Yes |
CONVFMT | The format used when converting a number to a string. | | Yes |
FIELDWIDTHS n | n is a space-delimited list of numbers giving the column widths. If this is set, gawk uses it instead of FS. | | Yes |
LINT n | n can be a number. When n is nonzero (indicating true), gawk displays fatal, invalid, or warning lint messages (same as the --lint command-line option) | | Yes |
TEXTDOMAIN | This is used for internationalization. | | Yes |
sub(str1,str2,var) | In the input string (var), the first match of str1 is replaced with str2, and the result is stored back in var | Yes | Yes |
gsub(str1,str2,var) | Same as sub, but global. It replaces every match in the input string (var). | Yes | Yes |
match(str,regex) | Returns the position at which regex matches in str, or 0 when there is no match. | Yes | Yes |
getline < file | Read next line from another input file. Sets $0, NF | Yes | Yes |
getline var < file | Read next line from another input file and store it in variable (var) | Yes | Yes |
toupper(str) | Converts str to upper-case | | Yes |
tolower(str) | Converts str to lower-case | | Yes |
\|& | Two-way communication between the awk command and an external process | | Yes |
systime() | Current time in epoch time. Combine with strftime, e.g. print strftime("%c", systime()) | | Yes |
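The string functions in the table can be tried out with a short sketch like the one below. It is restricted to the functions listed for both nawk and gawk (sub, gsub, match, and the RSTART and RLENGTH variables that match sets); the function names and sample strings are hypothetical.

```shell
#!/bin/sh
# sub() replaces only the first match, gsub() replaces every match.
# Both modify the variable given as the third argument and return the
# number of replacements made.
first_vs_all() {
  awk '{
         a = $0; b = $0
         sub(/o/, "0", a)
         n = gsub(/o/, "0", b)
         print a, b, n
       }'
}

# match() returns the 1-based position of the match (0 if none) and sets
# RSTART and RLENGTH, which substr() can use to extract the matched text.
find_digits() {
  awk '{
         if (match($0, /[0-9]+/)) {
           print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
         } else {
           print "no match"
         }
       }'
}

echo 'foo boo' | first_vs_all              # prints: f0o boo f00 b00 4
echo 'order 12345 shipped' | find_digits   # prints: 7 5 12345
```

Note the argument order of match: the string to search comes first and the regular expression second.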
Comments on this entry are closed.
No mention is made of mawk in your article. It is smaller and faster than gawk but has limits on NF and sprintf buffer size.
awk, gawk, and nawk have varying support for regular expressions. IIUC, awk supports grep regular expressions, while gawk supports egrep regular expressions. I’m not sure how nawk fits in here.
Also sockets… AFAIK only nawk has socket support.
re: GAWK: it’s GNU/Linux distributions, not Linux distributions. A clue is in the fact that the GNU version of AWK is in all of them.